import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
import matplotlib.patches as mpatches
import seaborn as sns
from plotly import graph_objects as go
import plotly.express as px
from datetime import datetime
import math
import scipy
from scipy.stats import norm
from scipy import stats as st
import sys
import warnings
if not sys.warnoptions:
    warnings.simplefilter('ignore')
This analysis covers user behavior at an online retailer.
The data comprise three datasets: one assigning participants to A/B test groups, one recording the events users performed on the site, and one with enrollment information for each participant.
The A/B test assesses whether users improve their conversion rates within 14 days of their first visit to the site. A is the control group; B is the test group, composed of users exposed to an improved recommendation system.
After initial inspection of the data, the notebook analyzes user activity and activity by date, then conversion, and finally the statistical results of the A/B test.
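As a concrete sketch of the 14-day conversion window described above (column names mirror the datasets; the dates themselves are made up):

```python
import pandas as pd

# Toy illustration of "converted within 14 days of first visit";
# first_date / event_dt mirror the dataset columns, the values are hypothetical
df = pd.DataFrame({
    'first_date': pd.to_datetime(['2020-12-07', '2020-12-07']),
    'event_dt': pd.to_datetime(['2020-12-15', '2020-12-28']),
    'event_name': ['purchase', 'purchase'],
})
df['within_14d'] = (df.event_dt - df.first_date).dt.days <= 14
print(df.within_14d.tolist())  # [True, False]
```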
participants=pd.read_csv('final_ab_participants_us.csv')
participants.info()
participants.describe()
The analysis concerns only the Interface EU test, the one test for which both enrollment and the test period were completed. All preprocessing and analysis therefore focus on the relevant rows of the dataset.
#creating df for ab test participants
eurtest=participants[participants.ab_test=='interface_eu_test']
eurtest.shape
eurtest.describe()
eurtest.sample()
eurtest.user_id.nunique()
eurtest[eurtest.isna().any(axis=1)]
eurtest[eurtest.duplicated()]
There are no nulls nor duplicates in the EU test dataset.
The testing groups are not equal: group A has 95 more members than group B.
That difference represents less than 1 percent of all participants, so analysis and testing will continue with it in place.
Unequal groups in this case will not increase false positives, but may raise the risk of type II errors: false negatives that result from failing to reject a false null hypothesis.
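To make the imbalance concrete, a quick sketch with hypothetical group sizes chosen to match the description (95 more users in A, under 1% of the total):

```python
import pandas as pd

# Hypothetical group sizes matching the 95-user gap described above
counts = pd.Series({'A': 5467, 'B': 5372})
diff = counts['A'] - counts['B']
share = diff / counts.sum()
print(diff, f"{share:.2%}")  # 95, 0.88%
```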
events=pd.read_csv('final_ab_events_us.csv',parse_dates=['event_dt'])
events.info(memory_usage='deep')
events[events.duplicated()]
events.isnull().sum()
events.user_id.nunique()
There are nearly four times as many events as unique participants in the A/B testing dataset.
The nulls in the details column correspond to actions with no dollar amount attached. Since three of the four event types don't involve a dollar figure, this share of nulls is expected and will remain as is.
#combining user test data with event df
eurtestevent=pd.merge(eurtest, events, how='left')
eurtestevent.isnull().sum()
eurtestevent=eurtestevent[eurtestevent['event_name'].notnull()]
eurtestevent.user_id.nunique()
eurtestevent[eurtestevent.duplicated()]
eurtestevent.groupby('user_id')['event_name'].unique().count()
eurtestevent.event_name.value_counts()
Purchases exceed visits to the product cart page, so there is clearly another path to purchase on the site, such as a Buy It Now button.
Although login is not required for any of the other actions, it still accounts for most site events, with product page views the second most frequent event.
userstart=pd.read_csv('final_ab_new_users_us.csv',parse_dates=['first_date'])
userstart.info()
userstart.sample()
userstart.isnull().sum()
userstart[userstart.duplicated()]
userstart.user_id.nunique()
userstart.first_date.max()
There are six times as many users who registered as unique users who engaged in events.
Enrollment spans more than two weeks after the launch date. That is not an effective design for an A/B test meant to assess behavior over a two-week window in December, since many of those enrolled late will not have been in the test for a full two weeks.
On the upside, user activity likely peaks when interest is highest, which is when users invest the time to log in or view product pages, as most of the events reflect. This analysis will explore how active users are as a function of the number of days since their enrollment.
#creating a df to help understand whether device usage on the site differs among test groups
devicecheck=pd.merge(eurtest,userstart)
#starting to create a df for funnel and churn analysis by paring down features
userdateregion=userstart.drop(columns='device')
#selecting EU users for EU AB Test df for analysis
userdateregioneu=userdateregion[userdateregion.region=='EU']
#merging EU user start dates with EU AB test and event df
eurtestevent=pd.merge(eurtestevent, userdateregioneu, how='left')
eurtestevent.isnull().sum()
eurtestevent[eurtestevent.duplicated()]
eurtestevent.user_id.nunique()
eurtestevent.ab_test.unique()
eurtestevent.region.unique()
#dropping columns with all the same value that is known
eurtestevent=eurtestevent.drop(columns=['region','ab_test'])
eurtestevent.sample()
eurtestevent.first_date.max()
eurtestevent.event_dt.max()
eurtestevent.first_date.min()
eurtestevent.event_dt.min()
eurtestevent.info()
Datatype for user_id remains a string, because user_id contains letters. The two dates in this dataframe were assigned datetime format. Group and event names remain strings that reflect their actual content. Details remains float because it contains a preponderance of nulls and its numeric values are dollar figures with cents.
A comparison of dates confirmed that the first event falls after the earliest enrollment date and that the last event date is a week after the last enrollment date. That still does not give everyone who enrolled a full two weeks to convert.
All four events for both test groups have a healthy share of user activity to analyze conversion and provide a base for proportion hypothesis testing.
Many aspects of the data will be considered at this stage to ensure the integrity of the data being tested.
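One way to run the date-integrity check mentioned above is a single vectorized comparison; this toy version uses made-up rows, while the real check would run on eurtestevent:

```python
import pandas as pd

# Toy sanity check that no event precedes its user's enrollment date
df = pd.DataFrame({
    'first_date': pd.to_datetime(['2020-12-07', '2020-12-10']),
    'event_dt': pd.to_datetime(['2020-12-08', '2020-12-12']),
})
no_premature_events = (df.event_dt >= df.first_date).all()
print(no_premature_events)  # True
```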
df_a=eurtestevent[eurtestevent.group=='A']
df_b=eurtestevent[eurtestevent.group=='B']
a_users = set(df_a.user_id.unique())
b_users = set(df_b.user_id.unique())
len(a_users)
len(b_users)
#users appearing in both groups would contaminate the test
overlap = a_users & b_users
len(overlap)
The check above confirms that no user appears in both test groups.
devicecheck_b=devicecheck[devicecheck.group=='B']
devicecheck_a=devicecheck[devicecheck.group=='A']
devicecheck_a.device.unique()
devicecheck_a.device.value_counts()
vc_a = devicecheck_a.device.value_counts()
vc_b = devicecheck_b.device.value_counts()
device_alist = vc_a.to_list()
device_blist = vc_b.to_list()
#aligning by device label so each ratio compares counts for the same device
b_deviceshareof_a = (vc_b / vc_a.reindex(vc_b.index)).round(3)
b_deviceshareof_a
#using value_counts index and values together keeps device labels aligned with their counts
vcounts_b = devicecheck_b.device.value_counts()
vcounts_a = devicecheck_a.device.value_counts()
trace1 = go.Bar(
    x=vcounts_b.index,
    y=vcounts_b.values,
    name='B',
    text=vcounts_b.values,
    textposition='outside',
    marker=dict(color='sandybrown')
)
trace2 = go.Bar(
    x=vcounts_a.index,
    y=vcounts_a.values,
    name='A',
    text=vcounts_a.values,
    textposition='outside',
    marker=dict(color='skyblue')
)
data = [trace1, trace2]
layout = go.Layout(barmode='group',height=500,
title='Test Group Users by Device',xaxis_title="Device", yaxis_title="Users")
fig = go.Figure(data=data, layout=layout)
fig.show()
Group B has more users than Group A on Macs and Android devices; Group A has more users than Group B on PCs and iPhones.
The margin by which B exceeds A on Androids and Macs is slimmer than the margin by which A exceeds B on PCs and iPhones, where Group B has about 95% as many users as Group A.
The balance of users among devices is close enough not to significantly degrade the integrity of the test.
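The eyeball comparison above could be formalized with a chi-square test of homogeneity on the device-by-group contingency table; the counts below are hypothetical, not the actual figures from devicecheck:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical device counts per group (columns: Android, Mac, PC, iPhone)
counts = np.array([
    [1158, 1230, 1532, 1488],   # group A (made-up numbers)
    [1210, 1265, 1455, 1420],   # group B (made-up numbers)
])
chi2, p, dof, expected = chi2_contingency(counts)
print(dof, round(p, 4))  # a p-value above 0.05 means the device mix is plausibly balanced
```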
eurtestevent[eurtestevent.event_name=='purchase'].head()
spend_sum=eurtestevent[eurtestevent.event_name=='purchase'].groupby('user_id')['details'].sum()
eurtestevent[eurtestevent.event_name=='purchase'].groupby('user_id')['details'].sum().describe()
#distplot is deprecated in recent seaborn releases; histplot/displot are the modern equivalents
ax=sns.distplot(spend_sum,color='skyblue',fit=norm,kde_kws={
    'color':"#3498DB",'lw':1.5,"label":'KDE'},hist_kws={'histtype':'stepfilled','lw':1,'alpha':.8},bins=40)
ax.set_title('Probability Density Distribution of Purchases for each User')
ax.set_ylabel('probability')
ax.set_xlabel('dollar expenditure');
The user who spent the most, a whopping \$1115, spent 223 times as much as the lowest spender at five dollars. That high spender and the entire top quartile, each exceeding \$110, skew the spending distribution on this site far to the right.
print("Average number of events per user:",eurtestevent.event_name.count()/eurtestevent.user_id.nunique())
eventsbyuser=eurtestevent.groupby(['user_id'])['event_name'].count().reset_index(
).sort_values(by='event_name',ascending=False)
#Distribution of the number of events by individual users
eventsbyuser.plot(kind='hist',ec='black',title='Events by User Distribution')
eventsbyuser.describe()
On average, users perform between seven and eight actions on the site, an average pulled upward by the third quartile (six to 10 actions each) and the top quartile (mostly 10 to 20 actions each).
The distribution of user activity is right skewed: a quarter of users do more actions than the majority, who do fewer than 10 each.
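Right skew can be confirmed numerically: in a right-skewed distribution the mean sits above the median and the skewness statistic is positive. A sketch on synthetic lognormal counts (not the real per-user data):

```python
import numpy as np
from scipy.stats import skew

# Synthetic right-skewed "events per user" counts (lognormal), illustrative only
rng = np.random.default_rng(0)
events_per_user = np.round(rng.lognormal(mean=1.8, sigma=0.6, size=3600)).astype(int)
print(np.mean(events_per_user) > np.median(events_per_user))  # mean pulled right of median
print(skew(events_per_user) > 0)                              # positive skewness statistic
```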
eventnames=eurtestevent.event_name.unique()
eventnames
for i in eventnames:
    print("Number of events for",i,":",eurtestevent[eurtestevent.event_name==i].shape[0])
for i in eventnames:
    print("Number of events for A group",i,":",df_a[df_a.event_name==i].shape[0])
for i in eventnames:
    print("Number of events for B group",i,":",df_b[df_b.event_name==i].shape[0])
for i in eventnames:
    print("Average number of times each A group user did event",i,round(df_a[
        df_a.event_name==i].shape[0]/df_a.user_id.nunique(),2),"while users in B group did it",round(df_b[
        df_b.event_name==i].shape[0]/df_b.user_id.nunique(),2))
eurtestevent.first_date.describe()
usersbystartdate=eurtestevent.groupby(eurtestevent['first_date'].dt.date)['user_id'].nunique()
userstartdatedf=usersbystartdate.to_frame().reset_index()
userstartdatedf.columns=['start_date','users']
userstartdatedf
sortedstart=userstartdatedf.sort_values(by='start_date',ascending=False)
#unique users per date
usersbydate=eurtestevent.groupby(eurtestevent['event_dt'].dt.date)['user_id'].nunique()
userdatedf=usersbydate.to_frame().reset_index()
userdatedf.columns=['date','users']
userdatedf
sorteduserdatedf=userdatedf.sort_values(by='date',ascending=False)
print(np.mean(userdatedf.users.head(-1)))
print(np.median(userdatedf.users.head(-1)))
mean_users_perdate=round(np.mean(userdatedf.users),2)
print(mean_users_perdate)
print(np.median(userdatedf.users))
quarterpercentile_userenrollment= np.percentile(userstartdatedf.users,25)
quarterpercentile_userenrollment
color = sorteduserdatedf['users'].apply(lambda x: 'springgreen' if x > mean_users_perdate else 'red')
color2 = sortedstart['users'].apply(lambda x: 'palegreen' if x > quarterpercentile_userenrollment else '#F778A1')
plt.figure(figsize=(25,15))
#y positions sized to the data so stems, markers and tick labels line up
event_y = np.arange(len(sorteduserdatedf))
start_y = np.arange(6, 6 + len(sortedstart))
plt.hlines(y=event_y, xmin=0, xmax=sorteduserdatedf['users'], color=color, linewidth=13)
plt.hlines(y=start_y, xmin=0, xmax=sortedstart['users'], color=color2, linewidth=13)
plt.plot(sorteduserdatedf['users'], event_y, "*", markersize=29, markeredgewidth=.5,
         markeredgecolor='gold', markerfacecolor='#FFD801')
plt.plot(sortedstart['users'], start_y, "8", markersize=18, markeredgewidth=.4,
         markeredgecolor='yellow', markerfacecolor='#FFE87C')
plt.yticks(event_y, sorteduserdatedf['date'], fontsize=15)
plt.xticks(fontsize=15)
plt.ylabel("Date",fontsize=17)
plt.xlabel("Users",fontsize=17)
legend_dict = {'above average user events': 'springgreen', 'top 75% enrollment': 'palegreen',
               'below average user events': 'red', 'below 25th percentile user enrollment': '#F778A1'}
useractlegend = [mpatches.Patch(color=c, label=k) for k, c in legend_dict.items()]
plt.legend(handles=useractlegend,numpoints=4,fontsize='x-large',title='User Activity Legend',title_fontsize='x-large')
plt.title("Users By Start and Event Date",fontsize=20)
plt.show();
For the entire two weeks leading up to Christmas, the number of active users exceeded the mean. It dropped below the mean on Christmas day and continued dropping to virtually nothing by the end of the month.
User participation exceeded its mean on 14 days; all but two of those were start dates on which enrollment exceeded its 25th percentile. On three dates when enrollment fell below that 25th percentile, participation dropped below the mean. The launch date is the only date on which user enrollment exceeded user participation.
Enrollment stopped on the 23rd, while participation remained strong that day and the next, Christmas Eve. Starting Christmas day, participation plummeted.
The primary unusual aspect to this dataset is the fact that the A/B test was launched in December to assess website conversion, rather than anything related to Christmas itself. Consumer behavior across industries and retail is significantly impacted by holiday shopping, weather and in 2020, the pandemic's impact on product supply. (In December 2020, many retailers, such as Lego.com, are out of nearly their entire stock and not accepting orders for items not in stock.) December factors need to be taken into account when measuring any other retail data, as in the case of this dataset.
#portion of users who performed each of the actions.
#sorted by users who did each action at least once
print("Users who performed each action, sorted by users who did each action at least once.")
eachaction=eurtestevent.groupby(['event_name'])['user_id'].nunique().sort_values(ascending=False)
eachaction
print("These are the ratios of users who did each action, conversion, sorted by frequency.")
userratios=eachaction/eurtestevent.user_id.nunique()
userratios=userratios.to_frame()
userratios=userratios.reset_index()
userratios=userratios.style.format({
'user_id': '{:,.2f}'.format
})
userratios
atleastonce=eurtestevent.groupby(['user_id','event_name'])['event_dt'].count().reset_index()
print("This is a table of how many times each user did this event.")
atleastonce.columns=['user_id','event','times']
atleastonceframe=pd.DataFrame(atleastonce)
atleastonceframe.head()
totalevents_byuser=atleastonceframe.groupby('user_id')[['times']].sum().reset_index()
totalevents_byuser.head()
totalevents_byuser.describe()
purchasedf=atleastonce[atleastonce.event=='purchase']
purchasedf.head()
purchasedf.describe()
timesvalues=purchasedf.times.value_counts()
x_values = pd.Series(range(0,len(purchasedf)))
# random state for reproducibility
np.random.seed(500)
# areas and colors
N = len(purchasedf)  #one random area/color per row
rn = 2 * np.random.rand(N)
theta = np.pi * np.random.rand(N)
area = 150 * rn**2
colors = theta
sns.set_style('darkgrid')
fig = plt.figure(figsize=(6,6))
ax = fig.add_subplot(111, projection='polar')
c = ax.scatter(x_values, purchasedf['times'], c=colors, s=area, cmap='cool', alpha=0.75)
plt.title("Purchase events per User")
x_values = pd.Series(range(0,len(totalevents_byuser)))
# random state for reproducibility
np.random.seed(500)
# areas and colors
N = len(totalevents_byuser)  #one random area/color per row
rn = 2 * np.random.rand(N)
theta = np.pi * np.random.rand(N)
area = 150 * rn**2
colors = theta
sns.set_style('darkgrid')
fig = plt.figure(figsize=(6,6))
ax = fig.add_subplot(111, projection='polar')
c = ax.scatter(x_values, totalevents_byuser['times'], c=colors, s=area, cmap='rainbow_r', alpha=0.75)
plt.title("Total events per User");
print("These are the events done by users who each did more than one event.")
multipletimes=atleastonce[atleastonce.times>=2].groupby(['event'])['user_id'].nunique().sort_values(ascending=False)
multipletimes
print("These are the ratios of users who did more than one event")
print("So out of 34% who made an initial purchase, 91.7% returned to make another purchase.")
round(multipletimes/eachaction,3)
Of the two-thirds who visited the product page, 92% returned to visit it again.
Of the third who made a purchase, 92% returned to make another purchase. The percentages for the product cart match those for purchases.
Each activity can stand on its own, because no activity is a prerequisite for another. That said, twice as large a share of users visit product pages, and viewing a product page doubtless builds the knowledge and desire that lead a user to fill a cart or make a purchase.
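The repeat ratios above come from counting users who did an event at least twice; a minimal sketch of that calculation on toy data:

```python
import pandas as pd

# Toy repeat-rate: share of purchasers who purchased at least twice
events = pd.DataFrame({'user_id': ['u1', 'u1', 'u2', 'u3', 'u3', 'u3'],
                       'event': ['purchase'] * 6})
times = events.groupby(['user_id', 'event']).size().rename('times').reset_index()
at_least_once = times.user_id.nunique()                 # 3 purchasers
repeaters = times[times.times >= 2].user_id.nunique()   # 2 repeat purchasers
print(round(repeaters / at_least_once, 3))              # 0.667
```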
usersbyeventpop=eurtestevent.groupby('event_name').agg({'user_id':'nunique'}).sort_values('user_id',ascending=False).reset_index()
usersbyeventpop.columns=['events','users']
usersbyeventpop['conversion']=round(usersbyeventpop['users']/(eurtestevent['user_id'].nunique()),2)
usersbyeventpop
re_sort_index=[0, 2, 1, 3]
#creating a funnel
funnel=eurtestevent.groupby(['event_name'])['user_id'].nunique().reset_index()
#reindexing from alphabetical row order to funnel order: login, product_page, product_cart, purchase
funnel_sorted=funnel.reindex(re_sort_index)
funnel_sorted
funnel_sorted['percent_change']=funnel_sorted['user_id'].pct_change()
funnel_sorted
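The percent_change column measures the step-to-step change down the funnel; a toy version with hypothetical counts, chosen so purchases exceed cart visits as in the real data:

```python
import pandas as pd

# Hypothetical funnel counts; note purchases > product_cart, as observed above
funnel = pd.DataFrame({'event_name': ['login', 'product_page', 'product_cart', 'purchase'],
                       'user_id': [8400, 5400, 2700, 2750]})
funnel['percent_change'] = funnel['user_id'].pct_change()
changes = funnel['percent_change'].round(3).tolist()
print(changes)  # [nan, -0.357, -0.5, 0.019]
```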
group_a=eurtestevent[eurtestevent.group=="A"].groupby(['event_name','group'])['user_id'].nunique().reset_index(
).sort_values(by='user_id',ascending=False)
group_a_sorted=group_a.reindex(re_sort_index)
eventlist_a=group_a_sorted.event_name
user_a_list=group_a_sorted.user_id
group_b=eurtestevent[eurtestevent.group=="B"].groupby(['event_name','group'])['user_id'].nunique().reset_index(
).sort_values(by='user_id',ascending=False)
group_b_sorted=group_b.reindex(re_sort_index)
eventlist_b=group_b_sorted.event_name
user_b_list=group_b_sorted.user_id
funfig= go.Figure()
funfig.add_trace(go.Funnel(
name='A',
y=eventlist_a,
x=user_a_list,
text='A',
textposition='inside',
textinfo='text+value+percent initial+percent total',
marker = {"color": ['lightcyan','skyblue','dodgerblue','steelblue']}))
funfig.add_trace(go.Funnel(
name='B',
y=eventlist_b,
x=user_b_list,
textposition='inside',
text='B',
textinfo='text+value+percent initial+percent total',
marker = {"color": ['sandybrown','burlywood','rosybrown','peru']}))
funfig.update_layout(title="Interactive Funnel for Both Experiment Groups")
#How many users in each group do each event
pivot= eurtestevent.pivot_table(
index='event_name', values='user_id', columns='group', aggfunc=lambda x: x.nunique()).reset_index()
pivot
pivot=pivot.reindex([0, 2, 1, 3])
pivot
eurtestevent.info()
usersbothdate=eurtestevent.groupby([eurtestevent.event_dt.dt.date,'first_date'])['user_id'].count()
usersbothdate=usersbothdate.to_frame().reset_index()
usersbothdate.columns=['event_date','start_date','users']
usersbothdate.event_date=usersbothdate.event_date.astype('datetime64[ns]')
usersbothdate['life']=(usersbothdate.event_date-usersbothdate.start_date)
usersbothdate['life_num'] = usersbothdate['life'].dt.days.astype('int16')
usersbothdate.head(2)
lifepivot= usersbothdate.pivot_table(index='life',
values='users',aggfunc={'users':[np.sum,np.mean]}).reset_index()
lifepivot.columns=['life','avg_users','sum_users']
lifepivot.avg_users=round(lifepivot.avg_users,1)
lifepivot.head(15)
lifepivot[lifepivot.life>'14 days'].sum_users.sum()
lifepivot['life_num']=lifepivot.life/np.timedelta64(1,'D')
lifepivot.life_num.values
lifecolorscale=["springgreen", "yellow","orange","red"]
reverse=lifepivot.sort_values(('life_num'), ascending=False)
fig = px.scatter(lifepivot, x="life_num", y="sum_users", color='life_num', color_continuous_scale=lifecolorscale,
hover_data=[
'life_num','sum_users','avg_users'],title="Users by days of Life: Hover for details")
fig.update_traces(marker=dict(size=(reverse.life_num.values+15), line=dict(width=2,
color='lemonchiffon')), selector=dict(mode='markers'))
fig.update_layout(coloraxis_colorbar=dict(
title="Life in days"),xaxis=dict(title='Life in Days'),yaxis=dict(title='Sum of Users'))
fig.show()
This shows that users are most active on their first day of enrollment, and that activity dwindles as more time passes since their start date.
The fewest users come to the site at the largest number of days after their enrollment.
pd.DataFrame(lifepivot).head()
lifepivot.life_num= lifepivot.life/np.timedelta64(1,'D')
eventsbydate=eurtestevent.groupby([eurtestevent.event_dt.dt.date,'event_name'])['user_id'].count()
eventsbydate=eventsbydate.to_frame().reset_index()
eventsbydate.columns=['event_date','event','users']
eventsbydate.event_date = pd.to_datetime(eventsbydate.event_date)
eventsbydate.event_date = eventsbydate.event_date.dt.date
eventsbydate.sample(3)
eventsbydate.head()
sortedeventsbydate=eventsbydate.sort_values(by='event_date',ascending=False)
colorbyevent={'login':'cyan','product_page':'springgreen','product_cart':'orange','purchase':'magenta'}
fig = px.scatter(eventsbydate, x="users", y="event_date", size='users',color='event',color_discrete_map=colorbyevent,
category_orders={"event": ["login", "product_page", "product_cart", "purchase"]},
hover_data=['event','event_date'],title="Events by Date: Hover for details")
fig.update_traces(marker=dict(line=dict(width=2,
color='lemonchiffon')),
selector=dict(mode='markers'))
fig.show()
The greatest increases in activity over the strong pre-Christmas period come from the two most popular activities, login and product page visits. Purchases almost always exceed product cart activity.
eurtestevent.first_date.value_counts()
#How many users in each group do each event
pivot= eurtestevent.pivot_table(
index='event_name', values='user_id', columns='group', aggfunc=lambda x: x.nunique()).reset_index()
pivot=pivot.reindex([0, 2, 1, 3])
pivot
Exploratory analysis made clear that the events are neither fully intertwined nor required of one another; that is how more users can make purchases than fill product carts. Although login and product page views are not required steps, both activities are so frequent that they can, and likely do, affect the likelihood of cart and purchase activity.
Group A users on average did every event except product cart slightly more often than group B users. Nonetheless, total activity was fairly equivalent between the groups in every category except purchasing, where group A had 108% of group B's purchase events.
Although most users made four or fewer purchases and spent less than 20 dollars, the distributions of both purchase frequency and spend skew to the right, reflecting an active top quartile with 4 to 9 purchases each and a top spending quartile spending more than \$110 each and as much as \$1115.
The timing of Christmas during data gathering strongly affects events. Activity in all categories tapers off after Christmas Eve, and purchase events dwindle to less than half their level during the ten days leading up to Christmas.
Primary question for investigation: Is there a significant statistical difference between test groups in conversion - the proportion of all users in the test who participate in each event?
First there will be A/A testing to ensure that the experiment is statistically fair. Ideally, the A/A test will show no difference in conversion: a difference would point to a problem in the sample, while no difference indicates the test was implemented correctly.
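The A/A split itself is just a shuffle-and-halve of the control group; a self-contained sketch with toy user ids (the notebook applies the same idea to df_a):

```python
import pandas as pd

# Toy A/A split: shuffle one group, then cut it in half
users = pd.DataFrame({'user_id': [f'u{i}' for i in range(10)]})
shuffled = users.sample(frac=1, random_state=42).reset_index(drop=True)
half = len(shuffled) // 2
top, bottom = shuffled.head(half), shuffled.tail(len(shuffled) - half)
overlap = set(top.user_id) & set(bottom.user_id)
print(len(top), len(bottom), overlap)  # 5 5 set()
```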
df_a_shuffle = df_a.sample(frac = 1)
half_df_a=df_a_shuffle.shape[0]/2
half_df_a=int(half_df_a)
half_df_a
def makenew(df, rows):
    top = df.head(rows)
    bottom = df.tail(len(df)-rows)
    return top, bottom
# Split dataframe into top and bottom
top_a_shuffle, bottom_a_shuffle = makenew(df_a_shuffle, half_df_a)
pivot_a= top_a_shuffle.pivot_table(
index='event_name', values='user_id', columns='group', aggfunc=lambda x: x.nunique()).reset_index()
pivot_a=pivot_a.reindex(re_sort_index)
pivot_a
pivot_a2= bottom_a_shuffle.pivot_table(
index='event_name', values='user_id', columns='group', aggfunc=lambda x: x.nunique()).reset_index()
pivot_a2=pivot_a2.reindex(re_sort_index)
pivot_a2
def check_hypothesis_aa(group1,group2,event,alpha=0.05):
    successes1=pivot_a[pivot_a.event_name==event][group1].iloc[0]
    successes2=pivot_a2[pivot_a2.event_name==event][group2].iloc[0]
    trials1 = top_a_shuffle[top_a_shuffle.group==group1]['user_id'].nunique()
    trials2 = bottom_a_shuffle[bottom_a_shuffle.group==group2]['user_id'].nunique()
    #success proportion in the first group:
    p1= successes1/trials1
    #success proportion in the second group:
    p2= successes2/trials2
    #success proportion in the combined dataset:
    p_combined = (successes1 + successes2) / (trials1 + trials2)
    #the difference between the datasets' proportions
    difference = p1 - p2
    #calculating the statistic in standard deviations of the standard normal distribution
    z_value = difference / math.sqrt(p_combined * (1-p_combined) * (1/trials1 + 1/trials2))
    #setting up the standard normal distribution (mean 0, standard deviation 1)
    distr = st.norm(0,1)
    #two-tailed p-value
    p_value = (1-distr.cdf(abs(z_value))) * 2
    print(
        "H0: There is no statistical difference in conversion for",event,
        "\nH1: There is a statistically significant proportional difference in conversion rate for",event,'\n')
    print(f'p_value: {p_value:.6f}')
    print("This test for Event:",event)
    if (p_value < alpha):
        print("Rejecting the null hypothesis: there is a significant difference between the proportions.")
    else:
        print("Failed to reject the null hypothesis: there is no reason to consider the proportions different.")
check_hypothesis_aa('A', 'A', 'product_cart',alpha=0.05)
check_hypothesis_aa('A', 'A', 'product_page',alpha=0.05)
check_hypothesis_aa('A', 'A', 'purchase',alpha=0.05)
check_hypothesis_aa('A', 'A', 'login',alpha=0.05)
AA testing shows no difference in conversion. The experiment is statistically fair.
def check_hypothesis(group1,group2,event,alpha=0.05):
    successes1=pivot[pivot.event_name==event][group1].iloc[0]
    successes2=pivot[pivot.event_name==event][group2].iloc[0]
    trials1 = eurtestevent[eurtestevent.group==group1]['user_id'].nunique()
    trials2 = eurtestevent[eurtestevent.group==group2]['user_id'].nunique()
    #success proportion in the first group:
    p1= successes1/trials1
    #success proportion in the second group:
    p2= successes2/trials2
    #success proportion in the combined dataset:
    p_combined = (successes1 + successes2) / (trials1 + trials2)
    #the difference between the datasets' proportions
    difference = p1 - p2
    #calculating the statistic in standard deviations of the standard normal distribution
    z_value = difference / math.sqrt(p_combined * (1-p_combined) * (1/trials1 + 1/trials2))
    #setting up the standard normal distribution (mean 0, standard deviation 1)
    distr = st.norm(0,1)
    #two-tailed p-value
    p_value = (1-distr.cdf(abs(z_value))) * 2
    print(
        "H0: There is no statistical difference in conversion for",event,
        "\nH1: There is a statistically significant proportional difference in conversion rate for",event,'\n')
    print(f'p_value: {p_value:.6f}')
    print("This test for Event:",event)
    if (p_value < alpha):
        print("Rejecting the null hypothesis: there is a significant difference between the proportions.")
    else:
        print("Failed to reject the null hypothesis: there is no reason to consider the proportions different.")
check_hypothesis('A', 'B', 'product_cart',alpha=0.05)
check_hypothesis('A', 'B', 'product_page',alpha=0.05)
check_hypothesis('A', 'B', 'purchase',alpha=0.05)
check_hypothesis('A', 'B', 'login',alpha=0.05)
A/B testing shows no difference in conversion for three of the four events.
For the most important event, purchase, the test shows a significant difference in proportions.
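To illustrate how a purchase result like this clears the 0.05 threshold, here is the same two-proportion z statistic computed on hypothetical counts (not the actual pivot values):

```python
import math
from scipy import stats as st

# Hypothetical purchase conversions for each group (made-up counts)
successes1, trials1 = 3144, 8412   # group A
successes2, trials2 = 2905, 8317   # group B
p1, p2 = successes1 / trials1, successes2 / trials2
p_combined = (successes1 + successes2) / (trials1 + trials2)
z = (p1 - p2) / math.sqrt(p_combined * (1 - p_combined) * (1 / trials1 + 1 / trials2))
p_value = 2 * (1 - st.norm(0, 1).cdf(abs(z)))
print(round(z, 2), p_value < 0.05)
```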
All told, the A/B test was a success. Enrolled users showed a strong tendency to return to the site and repeat their actions, providing enough activity records for a robust test of conversion.
A notable feature of this site is the independence of actions: no one action is contingent upon another. Nonetheless, the profusion of logins and product page views surely contributes to the third of users with product cart activity and the third of users who make purchases.
The hypothesis testing found a significant difference between test groups in conversion for purchasing activity. The A/B test helped the company narrow down the impact of the tested change and home in on this one critical activity. That leaves a fruitful line of research ahead: clarifying how best to leverage the test group's experience to boost purchases by the overall user base.